Simple Storage Service
S3 buckets are created in one region.
S3 Security (Resource Policies and ACLs)
There is one point that we always need to remember: S3 is private by default. Initially, nothing inside an AWS account apart from the account root user (the bucket owner) can access a bucket or an object; all other access has to be granted explicitly.
S3 Bucket Policies
- S3 bucket policies are a type of resource policy attached to a specific bucket. Resource policies are like identity policies, but they can reference identities (by ARN) from the same account OR different accounts (in an organisation). With a resource policy, we can also allow/deny anonymous principals.
- Resource policies always have a key called "Principal" which controls which identity is affected by the resource policy.
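As a sketch, a bucket policy that allows anonymous reads of every object might look like this (the bucket name example-bucket is a placeholder):

{
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicRead",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*"
    }]
}

Saved as policy.json, it can be attached with:
aws s3api put-bucket-policy --bucket example-bucket --policy file://policy.json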
ACLs (Access Control Lists)
ACLs are a legacy way of granting access on individual buckets and objects. They are inflexible compared to bucket policies and should not be used.
Block Public Access
This setting exists because there were a lot of disasters and data leaks caused by accidentally public buckets. Block Public Access is enabled by default, so while creating a bucket we have to explicitly uncheck it before the bucket can be read by anonymous principals.
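The same settings can be managed from the CLI; for example, re-enabling all four protections on a bucket (placeholder name):
aws s3api put-public-access-block --bucket example-bucket --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true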
S3 Static Website Hosting
- By default we can access S3 objects using the AWS API. Static website hosting allows access via HTTP.
- Index and Error documents are set and a website endpoint is created.
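For example, enabling it from the CLI (bucket and document names are placeholders):
aws s3 website s3://example-bucket/ --index-document index.html --error-document error.html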
Apart from hosting static websites like blogs, this feature can be used for the following:
- Offloading: any static data can be put into S3, while dynamic data is rendered from whatever compute service we use.
- Out-of-band pages: pages like maintenance pages or error pages.
Object Versioning and MFA Delete
Object versioning is disabled by default. It can be enabled through the API, but once enabled, it cannot be disabled again. It can only be suspended.
Versioning lets you store multiple versions of objects within a bucket. Operations which modify objects generate a new version instead of overwriting the existing version. If versioning is enabled on a bucket, a unique ID is assigned to each version of an object. Please remember these things:
- Cannot be switched off. It can only be suspended.
- Space is consumed by all versions, and we are billed for all of them.
- When we enable versioning, we can also enable MFA Delete, which requires MFA to change the bucket's versioning state or to permanently delete versions.
- Whenever we delete an object in a versioned bucket, the object is not actually deleted. Only a delete marker is added which essentially is a version of the object.
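Enabling versioning from the CLI looks like this (bucket name and MFA serial are placeholders; the MFA Delete variant has to be run by the root user):
aws s3api put-bucket-versioning --bucket example-bucket --versioning-configuration Status=Enabled
aws s3api put-bucket-versioning --bucket example-bucket --versioning-configuration Status=Enabled,MFADelete=Enabled --mfa "arn:aws:iam::111122223333:mfa/root-account-mfa-device 123456"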
S3 Performance Optimisation
We have a method called Single PUT Upload that delivers data to S3 in a single stream. If the stream fails, the whole upload fails and has to start again, which is not acceptable even for fast networks, let alone slow ones, and only 5GB of data is allowed per PUT. The solution is multipart upload, which sends the data as a set of chunks: a maximum of 10,000 parts is allowed (each between 5MB and 5GB), and each part can fail and be restarted individually.
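The high-level CLI (aws s3 cp) switches to multipart automatically for large files. As a sketch, the low-level flow with placeholder names is:
aws s3api create-multipart-upload --bucket example-bucket --key big.bin
# the call above returns an UploadId; each part is then uploaded (and retried) independently
aws s3api upload-part --bucket example-bucket --key big.bin --part-number 1 --body part1.bin --upload-id <UploadId>
# parts.json lists every part number with the ETag returned by upload-part
aws s3api complete-multipart-upload --bucket example-bucket --key big.bin --upload-id <UploadId> --multipart-upload file://parts.json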
S3 Transfer Acceleration is a way to transfer data into S3 via the nearest AWS edge location and the AWS global network, instead of crossing the public internet end to end.
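Acceleration is off by default and is enabled per bucket; the bucket name must be DNS-compatible and contain no dots. With a placeholder name:
aws s3api put-bucket-accelerate-configuration --bucket example-bucket --accelerate-configuration Status=Enabled
# then point the CLI at the accelerate endpoint for transfers
aws configure set default.s3.use_accelerate_endpoint true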
S3 Object Encryption
Buckets are NOT encrypted. Objects are. When we talk about encryption in S3, we are talking about encryption at rest, not encryption in transit (which happens by default). There are 2 types of encryption that can happen with S3:
- Client-Side Encryption: the encryption, key management and compute cost all sit on the client, and S3 is used purely for storage of the already-encrypted object.
- Server-Side Encryption: the encryption happens on AWS's servers once the data reaches S3.
It is important to remember that SSE is now mandatory in S3.
There are 3 types of SSE. Let's discuss them one by one.
SSE-C (Server Side Encryption with Customer Provided Keys)
In this type of SSE, the client is responsible for managing the keys whereas S3 is responsible for performing the cryptographic operations. The client sends the plaintext object and the key to S3, and the server does the encryption/decryption. The key itself is then discarded by S3, which stores only a salted hash of it.
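As a sketch with placeholder names; note that exactly the same 32-byte key has to be supplied again for every download:
openssl rand -out key.bin 32
aws s3 cp secret.txt s3://example-bucket/secret.txt --sse-c AES256 --sse-c-key fileb://key.bin
aws s3 cp s3://example-bucket/secret.txt out.txt --sse-c AES256 --sse-c-key fileb://key.bin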
SSE-S3
In this type of SSE, S3 is responsible for managing the keys and also for the cryptographic operations. For most situations, this is the best kind of SSE to use. The downside is that you cannot stop a full S3 administrator from being able to view data, and this full and open access might not be ideal for some environments. AES-256 is used as the encryption algorithm.
SSE-KMS (Server Side Encryption with Key Management Service)
Instead of S3 managing keys, KMS manages the keys. We can create customer managed KMS keys to encrypt and decrypt objects in S3 instead of relying on keys S3 itself controls. This type is perfect for scenarios where role separation is needed: an S3 full administrator with no permissions on the KMS key cannot decrypt the objects.
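For example, uploading an object encrypted under a specific KMS key (bucket and key alias are placeholders):
aws s3 cp report.csv s3://example-bucket/ --sse aws:kms --sse-kms-key-id alias/example-key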
S3 Bucket Keys
In the general case, without bucket keys, each time a PUT happens on an S3 bucket, S3 has to make an API call to KMS to generate a DEK for that object, which in turn is used to encrypt the object's data.
S3 bucket keys are a way to reduce these calls. For a limited time period, a bucket-level key is generated by KMS, and that bucket key is used within S3 for encryption instead of calling KMS per object. Note that CloudTrail KMS events then show the bucket rather than individual objects, so object-level logging of key usage is lost when bucket keys are enabled.
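Bucket keys are switched on as part of the bucket's default encryption settings, e.g. (placeholder names):
aws s3api put-bucket-encryption --bucket example-bucket --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms","KMSMasterKeyID":"alias/example-key"},"BucketKeyEnabled":true}]}'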
S3 Storage Classes
Storage classes are different types of methods to store S3 objects. Each storage class has a different pricing model. The following are the storage classes.
S3 Standard
- 99.999999999% (11 nines) durability and 99.99% availability.
- Stored across at least three AZs.
- Should be used for frequently accessed data which is important and irreplaceable.
- We are billed for the storage and the requests. No specific retrieval fee.
- Millisecond first-byte latency.
- Can be made publicly available.
S3 Standard-IA (Infrequent Access)
Almost exactly the same as S3 Standard. The difference is that the storage cost is roughly half that of S3 Standard, but there is a retrieval fee on top of the request charges. Should be used for data which is long lived but not frequently accessed.
S3 One Zone IA
Similar to Standard-IA but stored in a single AZ. Should be used for data which is long lived, not frequently accessed, non-critical and easily replaceable.
S3 Glacier Instant Retrieval
Similar to S3 Standard-IA but for data accessed even less frequently. Retrieval is still almost instant.
S3 Glacier - Flexible Retrieval
We need to think of these objects as being in cold storage. They can't be made public, nor are they instantly available: objects have to be restored before they can be accessed, and first-byte latency is minutes to hours.
S3 Glacier Deep Archive
Data in a frozen state. First byte latency is hours or days.
S3 Intelligent Tiering
Intelligent tiering automates which tier an object lives in, moving objects between tiers based on how frequently they are accessed, in exchange for a per-object monitoring and automation fee. The tiers are frequent access, infrequent access, archive instant access, archive access and deep archive.
S3 Lifecycle Configuration
- Lifecycle Configuration is a set of rules consisting of actions that are applied to objects when certain criteria are met.
- The actions performed are of 2 types:
- Transition action: Change the storage class.
- Expiration action: Expire or add a delete marker to objects.
If the objects that we are storing in our bucket have a defined lifecycle, then setting up a lifecycle configuration makes sense. A minimum of 30 days in S3 Standard is required before an object can transition to Standard-IA or One Zone-IA.
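As a sketch (bucket name and prefix are placeholders), a configuration that moves logs to Standard-IA after 30 days, to Glacier after 90, and expires them after a year, saved as lifecycle.json:

{
    "Rules": [{
        "ID": "archive-logs",
        "Status": "Enabled",
        "Filter": {"Prefix": "logs/"},
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"}
        ],
        "Expiration": {"Days": 365}
    }]
}

aws s3api put-bucket-lifecycle-configuration --bucket example-bucket --lifecycle-configuration file://lifecycle.json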
S3 Replication
Replication is a way to tell S3 to replicate objects from a source bucket to a destination bucket. There are two types of replication:
- Cross Region Replication: In this type of replication, the source and the destination buckets are in different regions.
- Same Region Replication: in this type, the source and destination buckets are in the same region.
For replication in S3, the replication configuration is setup in the source bucket. There are a few options that we have for the configuration:
- Destination bucket.
- IAM role to assume when replicating.
In same-account replication, the IAM role is automatically trusted by the AWS account within which the role is created. For cross-account replication, it's important to know that the destination bucket needs a bucket policy attached that trusts the IAM role from the source account.
Let's talk about some replication options:
- Ownership.
- If we want to replicate all objects or a subset of objects.
- Which storage class objects in the destination bucket should use. By default the storage class of the source object is maintained.
- Replication Time Control (RTC). When this is enabled, replication is guaranteed within 15 minutes.
Also, let's talk about some replication considerations:
- By default, replication is not retroactive, and versioning needs to be ON. Batch Replication can be used to replicate existing objects.
- One-way replication, from source to destination.
- Replication can handle SSE-S3, SSE-KMS (with extra configuration) and, more recently, SSE-C encrypted objects.
- The source bucket owner needs permissions on the objects.
- Glacier and Glacier Deep Archive are not replicated.
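A minimal replication configuration, with placeholder account ID, role and bucket names, saved as replication.json and applied to the source bucket:

{
    "Role": "arn:aws:iam::111122223333:role/example-replication-role",
    "Rules": [{
        "ID": "replicate-all",
        "Status": "Enabled",
        "Priority": 1,
        "Filter": {},
        "DeleteMarkerReplication": {"Status": "Disabled"},
        "Destination": {"Bucket": "arn:aws:s3:::example-destination-bucket"}
    }]
}

aws s3api put-bucket-replication --bucket example-source-bucket --replication-configuration file://replication.json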
S3 Presigned URLs
Let us talk about the problem that S3 presigned URLs try to solve. There are a lot of use cases where an anonymous principal needs access to an object sitting in an S3 bucket, often only temporarily. There are 3 obvious ways to provide this:
- Give the principal an AWS identity.
- Give the principal AWS credentials.
- Make the bucket public.
None of the 3 are ideal. This is the problem presigned URLs solve.
Most common scenario for Presigned URLs
Suppose I have a client application and a server that provides the backend for this application. I can create an IAM User identity for this application and let this application generate presigned URLs for a specific object that the client wants to access.
Things to know
- You can create a presigned URL for an object you have no access to.
- When using the URL, the permissions match the identity which generated it.
- Access denied could mean the generating identity never had access... or doesn't have it now.
- Don't generate with a role, URL stops working when temporary credentials expire.
- Presigned URLs have the authentication and authorization of the identity that generated the URL right NOW.
aws s3 presign s3://animals4lifemedia12121/aotm.jpg --expires-in 60
S3 Select and Glacier Select
There are some use cases where we want only a part of an object instead of the whole object, for example fetching just the first 1,000 rows of a stored CSV log file. S3 Select and Glacier Select let us write SQL-like statements so that S3 filters the data server-side and sends only the matching part to the client.
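As a sketch (bucket, key and output file are placeholders), pulling the first 1,000 rows of a CSV:
aws s3api select-object-content --bucket example-bucket --key logs.csv --expression "SELECT * FROM S3Object s LIMIT 1000" --expression-type SQL --input-serialization '{"CSV": {"FileHeaderInfo": "USE"}}' --output-serialization '{"CSV": {}}' output.csv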
S3 Events
S3 Event notification is a way we can listen to different actions happening on an object or a bucket and react to it. We can listen to actions like:
- Object creation.
- Object deletion.
- Object restoration from deep archive or glacier storage classes.
- Bucket replication.
The way to react to these actions is to deliver an event to an SQS queue or an SNS topic, or to invoke a Lambda function. All of these services need the correct resource policy attached to them so that S3, as a principal, is able to access them.
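For example, invoking a Lambda function on every object creation (ARNs and names are placeholders), with the configuration saved as notification.json:

{
    "LambdaFunctionConfigurations": [{
        "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:process-upload",
        "Events": ["s3:ObjectCreated:*"]
    }]
}

aws s3api put-bucket-notification-configuration --bucket example-bucket --notification-configuration file://notification.json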
S3 Access Logs
S3 Access Logs can be set up on a source bucket, with the logs stored in a target bucket. They record accesses made to the source bucket.
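Enabling it from the CLI (source and target bucket names are placeholders; the target bucket must allow log delivery):
aws s3api put-bucket-logging --bucket example-source-bucket --bucket-logging-status '{"LoggingEnabled":{"TargetBucket":"example-log-bucket","TargetPrefix":"access-logs/"}}'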
S3 Object Lock
It's a feature of S3 in which object versions are locked so they can be written once and read many times (WORM). It requires versioning to be enabled on the bucket.
There are different modes in which object lock feature can be enabled:
- Retention Compliance => in this mode, days or years are set for which an object version is locked. Once set, even the account root user cannot change, delete or modify the object until the retention period expires.
- Retention Governance => same as Compliance but a little less restrictive: identities granted the special s3:BypassGovernanceRetention permission can bypass the lock.
- Legal Hold => this is just a lock that is toggled between on and off. An object version cannot be deleted or modified while it is on.
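For example, applying governance retention or a legal hold to an object version (names and date are placeholders):
aws s3api put-object-retention --bucket example-bucket --key report.pdf --retention '{"Mode":"GOVERNANCE","RetainUntilDate":"2030-01-01T00:00:00Z"}'
aws s3api put-object-legal-hold --bucket example-bucket --key report.pdf --legal-hold Status=ON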
S3 Access Points
- Simplifies managing access to S3 buckets and objects.
- We can create multiple access points for a bucket, and each access point can have its own policy and its own endpoint address instead of the default S3 address.
The command to create one is aws s3control create-access-point; for example, with placeholder account ID and names:
aws s3control create-access-point --account-id 111122223333 --name example-ap --bucket example-bucket